# High-Precision Description
Longva 7B TPO
MIT
LongVA-7B-TPO is a video-text model derived from LongVA-7B through temporal preference optimization (TPO) and aimed at long-video understanding tasks.
Video-to-Text
Transformers

ruili0
225
1
Cogflorence 2.2 Large
MIT
This model is a fine-tuned version of microsoft/Florence-2-large, trained on a 40,000-image subset of the Ejafa/ye-pop dataset with captions generated by THUDM/cogvlm2-llama3-chat-19B. It is suited to image-to-text tasks.
Image-to-Text
Transformers, Supports Multiple Languages

thwri
20.64k
33
Git Base Next Refined
MIT
A fine-tuned image-to-text model based on microsoft/git-base.
Large Language Model
Transformers, Other

swaroopajit
24
0
Featured Recommended AI Models